The blogosphere as an excitable social medium: Richter's and Omori's Law in media coverage



Peter Klimek, Werner Bayer, Stefan Thurner

IIASA, Schlossplatz 1, A 2361 Laxenburg, Austria
Section for Science of Complex Systems, Medical University of Vienna, Spitalgasse 23, A 1090 Vienna, Austria
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA



Abstract

We study the dynamics of public media attention by monitoring the content of online blogs. Social and media events can be traced by the propagation of word frequencies of related keywords. Media events are classified as exogenous, where blogging activity is triggered by an external news item, or endogenous, where word frequencies build up within a blogging community without external influences. We show that word occurrences exhibit statistical similarities to earthquakes. The size distribution of media events follows a Gutenberg-Richter law, and the dynamics of media attention before and after a media event follows Omori's law. We present further empirical evidence that for media events of endogenous origin the overall public reception of the event is correlated with the behavior of word frequencies at the beginning of the event, and is to a certain degree predictable. These results may imply that the process of opinion formation in a human society might be related to effects known from excitable media.

1. Introduction

Modern man is exposed to a constant stream of news which we absorb, read, digest, discuss, disseminate and forget. Most news items have relatively little impact. They receive a small amount of public attention over a short time and quickly descend into public oblivion. Occasionally, however, news reports have such a massive impact that they overthrow public opinions, long-held beliefs and even governments. A recent example is the 2010-2011 Tunisian protests, which were sparked by reports of police use of tear gas against young demonstrators. Of course, the reasons for these protests go far beyond this single incident; they involve multiple political, social and economic dimensions. It is remarkable how, under certain circumstances, reports about such relatively small incidents propagate through a society (i.e. a strongly interconnected system) and trigger a nation-wide revolution, while under other circumstances the news practically vanishes unheard. It is tempting to see such media events as phenomena of a human, social excitable medium. One may view them as a social analog to earthquakes [1, 2]: external stimuli trigger relaxation events in which accumulated "energy" is discharged or spread within a complex, networked system. This phenomenon can also be observed in other excitable media such as the brain [3], oscillating chemical reactions like the Belousov-Zhabotinsky reaction [4], or the Mexican wave (La Ola) at sports events [5].

Are there quantitative patterns in the way societies react to the arrival of breaking news? How can the impact of a news item be quantified? How long and with what intensity do people devote their attention to current news?
Such research questions become amenable to quantitative study through online news discussions. In recent years weblogging (or blogging) has emerged as a new publishing medium at the grassroots level of society. A blog is usually defined as a web page with entries listed in reverse chronological order, maintained by one or several writers. Blogs are often devoted to a single topic, such as politics, finance or sports, and provide news or commentaries, often several times per day. In fall 2007 the blog search engine technorati.com stopped tracking the number of active blogs once it exceeded 100 million. This rapid development sparked academic interest. The dynamical evolution of the blogosphere, defined as the collection of all blogs at a given time, has been studied from a network perspective [6], where nodes are blogs which are connected if one blog contains a hyperlink (URL) to the other. The spreading of news items can then be seen as a diffusion process on this network [7]. The accelerating shift of human conversations and discussions toward online media like blogs, Twitter, etc. has reached a point where web data mining techniques [1] can be used to, e.g., predict spikes in the sales ranks of books, measure public sentiment towards societal issues [10] or predict stock market movements [11].

In this work we study the impact of news reports on a small segment of the blogosphere by looking at time series of word frequencies. The sample contains blogs covering US domestic politics over a period of nearly two years. Blogs are typically not the first to report a story. They pick up developing news topics and offer comments and subjective points of view. They are therefore an ideal candidate to measure how the public 'digests' news. We show that the public reception of news reports follows statistics similar to those of earthquakes. The intensity of fore- and aftershocks can be described by a power law in analogy to Omori's law, and the size distribution of media events follows a Gutenberg-Richter law. It has been reported previously that Omori's law holds for the round-trip times of data packets in the internet [14]. Power-law signatures in human activity have recently been found in the distributions of waiting times between a catastrophe and humanitarian responses [15] or between contributions to a discussion in online forums [16].

2. Data and Methods 

A dataset of 168 large and highly popular blogs devoted to US domestic politics, covering the entire political spectrum, was compiled. The blog authors include popular political journalists (e.g. Glenn Beck or Taylor Marsh) as well as a number of self-proclaimed political commentators labeling themselves everything from 'far right' to 'liberal curmudgeons'. We recorded the content of each blog entry together with a time stamp at a resolution of one second. We focus on the period of 670 days between July 1st 2008 and May 3rd 2010. The dominant themes in this period include the 2008 US presidential elections, the advent of Sarah Palin, the health care reform and the Iraq torture scandal. We developed a proprietary web crawler that continually crawls specified websites. The crawler can be targeted at the web page of a blog containing a blog entry of a given date. The content of this page is automatically parsed and checked for hyperlinks to other blog entries. If a new link is found, the linked page is downloaded and stored. The software then analyzes the structure of the blog entry, identifying and storing its date, headline, and entire content. Once all links in the web page containing the blog entry have been identified, the initial blog page is searched for a link to another web page containing older entries of the same blog. This procedure is repeated as long as older entries are found. The crawler is implemented in the Java programming language in a fully object-oriented fashion. All collected data are stored in an SQL database, allowing efficient subsequent sorting and filtering. Filtering and preprocessing consisted of removing all short high-frequency words (e.g. 'the' or 'and') and stripping the plural 's' and other frequent endings such as 'ing' or 'ed'.
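The filtering step can be illustrated with a minimal Python sketch. The stopword set, the suffix list and the rule that at least three letters of stem must remain are assumptions made for illustration; the exact lists used for this study are not given in the text.

```python
import re

# Hypothetical stopword list: short high-frequency words to drop (assumption).
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it", "for", "on", "that"}
# Hypothetical endings to strip; "ing" and "ed" are checked before the plural "s".
SUFFIXES = ("ing", "ed", "s")


def strip_suffix(word: str, min_stem: int = 3) -> str:
    """Remove one frequent ending if at least `min_stem` letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, split into runs of letters, drop stopwords and strip endings."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [strip_suffix(t) for t in tokens if t not in STOPWORDS]


print(preprocess("The voters and the blogs were debating"))
# ['voter', 'blog', 'were', 'debat']
```

Such crude suffix stripping produces non-words like 'debat', which is harmless for counting frequencies as long as it is applied consistently; a proper stemmer would be a more robust choice.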

We refer to words surrounding political topics actively discussed in the blogosphere as 'keywords'. They were extracted by the following procedure. The frequency of all $26^3 = 17576$ possible letter triplets (that is 'aaa', 'aab', ..., 'zzz') was counted over all blog entries for each day. Nearly 50% of these triplets were discarded since they never occurred. From the remaining triplets we extracted the time series of daily frequencies and looked at their co-occurrences with words. In particular, for each remaining triplet we searched for the day with the highest frequency; call this day $t_0$. For this day a list of all words containing the triplet was made. For each word from this list the time series of daily word frequencies in the range between thirty days before and thirty days after $t_0$ was extracted. The word was kept for further analyses only if its daily word frequency attains its maximum within this range at $t_0$. This procedure left us with approximately 4000 keywords.

Figure 1: (a) Word frequencies of 'palin' before and after the nomination of Sarah Palin as vice presidential candidate in the 2008 presidential elections. The x-axis is time (days) before and after the event; the event itself is indicated by a vertical dashed line. Note the absence of pre-cursory activity. (b) Word frequencies of 'inauguration' before and after the inauguration of Barack Obama. There is pronounced pre-cursory growth. Insets: same in log-log scale.
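The keyword-extraction procedure can be sketched in Python as follows. We assume that `daily[t]` is a `collections.Counter` of preprocessed word frequencies on day `t`, that a triplet's daily frequency is the number of its occurrences inside words weighted by word frequency, and that ties for the peak day go to the earliest day; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from string import ascii_lowercase


def triplet_counts(day_words: Counter) -> Counter:
    """Occurrences of each 3-letter substring on one day, weighted by word frequency (assumption)."""
    counts = Counter()
    for word, n in day_words.items():
        for k in range(len(word) - 2):
            counts[word[k:k + 3]] += n
    return counts


def extract_keywords(daily: list[Counter], window: int = 30) -> set[str]:
    """Keep words whose frequency peaks on the peak day t0 of one of their triplets."""
    per_day = [triplet_counts(day) for day in daily]
    keywords = set()
    for letters in product(ascii_lowercase, repeat=3):  # all 26**3 = 17576 triplets
        trip = "".join(letters)
        series = [day[trip] for day in per_day]
        peak = max(series)
        if peak == 0:
            continue  # triplet never occurs: discarded
        t0 = series.index(peak)  # earliest day with the highest triplet frequency
        lo, hi = max(0, t0 - window), min(len(daily), t0 + window + 1)
        for word in daily[t0]:
            if trip in word:
                w = [daily[t][word] for t in range(lo, hi)]
                if daily[t0][word] == max(w):  # word peaks at t0 within +-window days
                    keywords.add(word)
    return keywords


# Toy data (illustrative only): 'blog' peaks on day 3, but all its triplets peak on day 1.
daily = [Counter() for _ in range(10)]
daily[1].update({"blogs": 100, "blog": 1})
daily[3].update({"blog": 5})
daily[4].update({"palin": 5})
daily[5].update({"palin": 50})
kw = extract_keywords(daily)
print(sorted(kw))  # ['blogs', 'palin']
```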

Let $w_i(t)$ be the word frequency of keyword $i$ at time $t$. An event is (somewhat arbitrarily) identified as a strong increase in word frequency over a short period of time, with a subsequent decay to (usually) the previous level. The time at which the peak occurs within an event is $t_0$. The event size $E$ is defined by the peak level of word frequency relative to the average level within a timespan $T$ before and after the peak. We work with $T = 30$ days. The event size for word $i$ at peak time $t_0$ is

$$E_i(t_0) = \frac{w_i(t_0)}{\langle w_i(t) \rangle_{t_0 - T \le t \le t_0 + T}} \tag{1}$$

Two classes of response functions in social systems have been previously identified: endogenous and exogenous, see e.g. [12]. Endogenous events start with a phase of pre-cursory growth, exogenous ones with a sudden burst in word frequencies. Both types of events are followed by a relaxation process after the peak is reached, which usually follows a power law.
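The event-size definition given above (peak frequency relative to the average frequency within $T = 30$ days before and after the peak) can be computed as in this minimal Python sketch. Whether the peak day itself enters the average, and how the window is clipped at the ends of the observation period, are not specified in the text; both are assumptions here.

```python
def event_size(w: list[float], t0: int, T: int = 30) -> float:
    """Event size: w[t0] divided by the mean of w over days t0 - T .. t0 + T.

    Assumptions: the peak day is included in the average and the window is
    clipped at the start and end of the series.
    """
    lo, hi = max(0, t0 - T), min(len(w), t0 + T + 1)
    window = w[lo:hi]
    mean_level = sum(window) / len(window)
    return w[t0] / mean_level


# Toy series: constant level 1 with a single spike of 62 at day 30.
series = [1.0] * 61
series[30] = 62.0
print(event_size(series, 30))  # 31.0
```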

An example of an exogenous event is the nomination of Sarah Palin as vice presidential candidate in the 2008 US presidential elections. Fig. 1(a) shows the word frequencies of 'palin' before and after her nomination. The dashed line indicates the peak in word frequency; a sudden jump is followed by a relaxation process. As an example of an endogenous process, consider the word frequencies of 'inauguration' before and after the inauguration of Barack Obama, Fig. 1(b). The day of the inauguration itself here coincides with $t_0 = 0$; the x-axis shows the days before and after the inauguration. There is a phase of pre-cursory growth culminating at $t_0$, which is followed by a relaxation process.

Figure 2: Cumulative distribution $\Pr(E^* > E)$ of event size $E$ for exogenous and endogenous events. The exogenous case (blue circles, dotted line) can be fitted by a Gutenberg-Richter law with exponent $\beta_{exo} = 1.00(3)$ (green solid line). The exponent for the endogenous event sizes (red squares, dashed line) is $\beta_{endo} = 0.57(4)$ (black dash-dotted line), close to the typical value for earthquakes.



It has been suggested [17] that word frequencies before and after an endogenous event at time $t_0$ follow power laws of the form

$$w_i(t < t_0) \propto (t_0 - t)^{-\alpha_g}, \qquad w_i(t > t_0) \propto (t - t_0)^{-\alpha_d}, \tag{2}$$

where in general the growth exponent before the peak at $t_0$, $\alpha_g$, and the decay exponent after $t_0$, $\alpha_d$, can be different. For the exogenous case there is no clear functional form for the word frequencies before $t_0$, while the relaxation dynamics is often of power-law type [17]:



$$w_i(t > t_0) \propto (t - t_0)^{-\gamma}. \tag{3}$$



Exogenous events, translated into seismological language, can be identified with the Omori law, where $w_i(t > t_0)$ is interpreted as the analog of the aftershock rate. The idea is that the frequencies of mentions of a keyword before (after) $t_0$ can be regarded as foreshock (aftershock) rates of that particular event. For endogenous events we also observe 'foreshocks' following an Omori law.

For each event in a time series of word frequencies in our database it was checked whether it is endogenous or exogenous. For this we segmented each time series (often containing several events) into windows of $T$ days and located the local maxima within each window¹. An event was classified as endogenous or exogenous if the residual sum of squares (RSS) of the fit to Eq. (2) or Eq. (3), respectively, computed around $t_0$ and normalized by $w_i(t_0)$, was below 0.15.
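The classification rule can be sketched in Python under several assumptions that the text leaves open: the power laws are fitted by least squares in log-log space, residuals are taken in linear space and divided by $w_i(t_0)$, and an event counts as endogenous when both the growth fit (Eq. (2), before $t_0$) and the decay fit pass the threshold, and as exogenous when only the decay fit does. The names `power_fit_rss` and `classify` are hypothetical.

```python
import math


def power_fit_rss(w, t0, tau, side):
    """Fit w(t0 + side*d) ~ a * d**(-alpha), d = 1..tau, by least squares in log-log space.

    Returns (alpha, rss), where rss sums squared linear-space residuals divided by
    w[t0] (assumed normalization), or None if fewer than two positive values exist.
    side=+1 fits the decay after t0, side=-1 the growth before t0.
    """
    pts = [(d, w[t0 + side * d]) for d in range(1, tau + 1)
           if 0 <= t0 + side * d < len(w) and w[t0 + side * d] > 0]
    if len(pts) < 2:
        return None
    xs = [math.log(d) for d, _ in pts]
    ys = [math.log(v) for _, v in pts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    amp = math.exp(my - slope * mx)
    rss = sum(((v - amp * d ** slope) / w[t0]) ** 2 for d, v in pts)
    return -slope, rss


def classify(w, t0, tau=30, threshold=0.15):
    """Assumed rule: both fits pass -> endogenous; only the decay fit passes -> exogenous."""
    after = power_fit_rss(w, t0, tau, +1)
    if after is None or after[1] >= threshold:
        return "unclassified"
    before = power_fit_rss(w, t0, tau, -1)
    if before is not None and before[1] < threshold:
        return "endogenous"
    return "exogenous"


# Toy series (illustrative only): same decay, with and without pre-cursory growth.
t0 = 30
endo = [0.0] * 61
exo = [0.0] * 61
endo[t0] = exo[t0] = 100.0
for d in range(1, 31):
    endo[t0 + d] = exo[t0 + d] = 100.0 * d ** -0.7  # decay exponent 0.7
    endo[t0 - d] = 100.0 * d ** -1.2                 # growth exponent 1.2
print(classify(endo, t0), classify(exo, t0))  # endogenous exogenous
```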



¹ To fit a power-law growth or decay we required non-zero word frequencies over at least two weeks before and after $t_0$. The range of the fit was chosen as $(t_0, t_0 \pm \tau)$ with $\tau \in \{14, \dots, 30\}$. For each value of $\tau$ in this range the Akaike Information Criterion was computed, and the exponent of the fit with the minimal value of this criterion was used for further analyses.

Figure 3: Cumulative distribution functions of $\gamma$ (blue dotted line), $\alpha_g$ (red solid) and $\alpha_d$ (green dashed). The similar curves for $\gamma$ and $\alpha_d$ suggest a universal decay law of public attention for media events. The exponents for pre-cursory growth $\alpha_g$ are typically higher than the decay exponents $\alpha_d$.
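The fit-window selection described in the footnote above can be sketched in Python as follows. The text does not specify the error model behind the Akaike Information Criterion; we assume the common least-squares form $\mathrm{AIC} = n \ln(\mathrm{RSS}/n) + 2k$ with $k = 2$ parameters, computed on the log-log fit. `best_exponent` is a hypothetical helper name.

```python
import math


def loglog_fit(ds: list[int], vs: list[float]) -> tuple[float, float]:
    """Fit log v = c - alpha * log d by least squares; return (alpha, RSS in log space)."""
    xs = [math.log(d) for d in ds]
    ys = [math.log(v) for v in vs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    c = my - slope * mx
    rss = sum((y - (c + slope * x)) ** 2 for x, y in zip(xs, ys))
    return -slope, rss


def aic_least_squares(rss: float, n: int, k: int = 2) -> float:
    """Assumed Gaussian least-squares AIC; a tiny floor avoids log(0) for exact fits."""
    return n * math.log(max(rss, 1e-300) / n) + 2 * k


def best_exponent(w, t0, side=+1, taus=range(14, 31)):
    """Return (aic, tau, alpha) minimizing AIC over tau, or None if no window is usable.

    side=+1 fits the decay after t0, side=-1 the growth before t0.
    """
    best = None
    for tau in taus:
        ds = list(range(1, tau + 1))
        idx = [t0 + side * d for d in ds]
        if min(idx) < 0 or max(idx) >= len(w):
            continue  # window runs past the ends of the series
        vs = [w[i] for i in idx]
        if any(v <= 0 for v in vs):
            continue  # non-zero frequencies are required over the whole window
        alpha, rss = loglog_fit(ds, vs)
        aic = aic_least_squares(rss, tau)
        if best is None or aic < best[0]:
            best = (aic, tau, alpha)
    return best


# Toy decay (illustrative only) with exponent 0.8 after t0 = 30.
w = [0.0] * 61
w[30] = 50.0
for d in range(1, 31):
    w[30 + d] = 50.0 * d ** -0.8
aic, tau, alpha = best_exponent(w, 30)
print(round(alpha, 3))  # 0.8
```

Because the number of points $n = \tau$ changes with the window, the AIC values are computed on different data sets, so this comparison is best read as a heuristic rather than strict model selection.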



3. Results 

With the above procedure we ended up with approximately 150 endogenous and 1000 exogenous events, i.e. an average of 0.2 endogenous and 1.5 exogenous events per day. Note that exogenous events are almost an order of magnitude more frequent. One media event generally corresponds to events in the word frequencies of several related keywords. This effect should be the same for endogenous and exogenous types.

The Gutenberg-Richter law is an empirical power law describing the frequency of earthquakes of a given radiated seismic energy $E_s$,

$$\Pr(E^* > E_s) \propto E_s^{-\beta}, \tag{4}$$



with a typical exponent of $\beta \approx 2/3$ [13]. Figure 2 shows the cumulative distribution function (CDF) $\Pr(E^* > E)$ for endogenous and exogenous events. In both cases we find a Gutenberg-Richter law,

$$\Pr(E^* > E) \propto E^{-\beta}. \tag{5}$$

For endogenous events we find $\beta_{endo} = 0.57(4)$, for exogenous events $\beta_{exo} = 1.00(3)$. Together with the relatively high number of exogenous compared to endogenous events, this suggests that exogenous events share characteristics of 'earthquake swarms' [18], i.e. sequences of small earthquakes over a relatively short time period.
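The text does not state how $\beta$ was estimated from the curves in Fig. 2. As one standard option (swapped in here, not necessarily the authors' method), the Python sketch below computes the empirical $\Pr(E^* > E)$ and the maximum-likelihood (Hill) estimate of $\beta$ for a continuous power-law tail above a threshold $E_{\min}$; the threshold is an assumption the user must supply, and the data are synthetic.

```python
import math
from bisect import bisect_right


def ccdf(sizes: list[float]) -> list[tuple[float, float]]:
    """Empirical Pr(E* > E) at every observed event size E (ties handled exactly)."""
    xs = sorted(sizes)
    n = len(xs)
    return [(x, (n - bisect_right(xs, x)) / n) for x in xs]


def hill_beta(sizes: list[float], e_min: float) -> float:
    """Maximum-likelihood exponent of Pr(E* > E) ~ E**(-beta) for E >= e_min."""
    tail = [e for e in sizes if e >= e_min]
    return len(tail) / sum(math.log(e / e_min) for e in tail)


# Synthetic check: evenly spaced quantiles of a power law with beta = 0.57.
n = 10_000
synthetic = [2.0 * ((k + 0.5) / n) ** (-1 / 0.57) for k in range(n)]
print(round(hill_beta(synthetic, e_min=2.0), 2))  # 0.57
```

The asymptotic standard error of this estimator is roughly $\hat\beta/\sqrt{n}$, with $n$ the number of events above $E_{\min}$.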

The cumulative distribution functions of $\gamma$ for exogenous events and of $\alpha_d$ and $\alpha_g$ for endogenous events are shown in Fig. 3. The value of the CDF of $\gamma^*$ at any given value $\gamma$ is defined as the probability that $\gamma^*$ is greater than $\gamma$, that is $\Pr(\gamma^* > \gamma)$. The exponents $\gamma$ and $\alpha_d$ follow almost the same distribution function. This might suggest a universal decay law of public attention to media events. For high exponent values the CDF of the growth exponents $\alpha_g$ is larger than that of the decay exponents. In general we find, for a given exponent value $x$,

$$\Pr(\alpha_g > x) > \Pr(\alpha_d > x) \approx \Pr(\gamma > x). \tag{6}$$
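Eq. (6) compares the empirical distributions of the three exponents at a common value $x$. A minimal Python sketch of that comparison follows; the exponent lists are toy values, not data from this study, and the tolerance used to decide '≈' is an assumption.

```python
from bisect import bisect_right


def survival(samples: list[float], x: float) -> float:
    """Empirical Pr(X > x), the quantity called the CDF in Fig. 3."""
    xs = sorted(samples)
    return (len(xs) - bisect_right(xs, x)) / len(xs)


def check_ordering(alpha_g, alpha_d, gamma, x, tol=0.1):
    """Test Pr(alpha_g > x) > Pr(alpha_d > x) and |Pr(alpha_d > x) - Pr(gamma > x)| <= tol."""
    p_g, p_d, p_gam = survival(alpha_g, x), survival(alpha_d, x), survival(gamma, x)
    return p_g > p_d, abs(p_d - p_gam) <= tol


# Toy exponents (illustrative only).
alpha_g = [0.5, 1.0, 1.5, 2.0]
alpha_d = [0.3, 0.6, 0.9, 1.2]
gamma = [0.35, 0.6, 0.85, 1.25]
print(check_ordering(alpha_g, alpha_d, gamma, x=0.8))  # (True, True)
```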




Figure 4: Scatter plot of $\alpha_g$ versus $\alpha_d$. Each point is one endogenous event, with x- and y-coordinates given by the values of the exponents in Eq. (2). The dashed line is a regression line; we find a correlation coefficient of $\rho \approx 0.67$ and a p-value of $10^{-\dots}$ against the null hypothesis of no correlation.



To what extent can one predict the dynamics of endogenous events from their pre-cursory growth? Fig. 4 shows for each endogenous event its growth exponent $\alpha_g$ versus its decay exponent $\alpha_d$. Especially for low values of the exponents (e.g. smaller than one) there is a clear correlation between them. Even if we include the entire dataset of endogenous events, we can reject the null hypothesis of no correlation with a p-value of $10^{-\dots}$. The correlation coefficient is $\rho \approx 0.67$. This correlation should increase drastically if one only included events with small growth and/or decay exponents.
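The correlation test can be sketched in Python with the Pearson coefficient and, as a swapped-in, assumption-light alternative to a parametric t-test, a permutation test of the null hypothesis of no correlation. A permutation test with `n_perm` shuffles cannot resolve p-values below $1/(n_{perm} + 1)$, so very small p-values require a parametric test; the function names and the toy data are hypothetical.

```python
import math
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for H0 'no correlation', obtained by shuffling y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (illustrative only): perfectly linearly related exponents.
x = [float(v) for v in range(20)]
y = [2.0 * v + 1.0 for v in x]
print(round(pearson(x, y), 3))                        # 1.0
print(permutation_pvalue(x, y, n_perm=2000) < 0.01)   # True
```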

4. Conclusions 

Blogs offer a new and exciting possibility to study collective human behavior on a quantitative basis. The blogosphere is a highly connected virtual space where people with different backgrounds and attitudes disseminate and discuss issues that have caught their attention and interest, in a way that can be exploited for quantitative studies. We empirically studied how people react to new pieces of information through the dynamical patterns of their blogging behavior. We view the blogosphere as a human, excitable social medium in which waves of collective excitement can be traced. We studied event-size, foreshock and aftershock distributions in this medium and found analogies to seismology. The intensity of fore- and aftershocks follows Omori's law, and the distribution of event sizes is of Gutenberg-Richter type. We presented indications that there exist significant correlations between the dynamics of fore- and aftershocks. One might also think of a 'Richter scale' for media events. The largest event recorded in our dataset is the nomination of Sarah Palin as vice presidential candidate. Indeed, aftershocks of this event are still trembling and quivering through our society.